A simple data discretizer

Authors

  • Gourab Mitra
  • Shashidhar Sundareisan
  • Bikash Kanti Sarkar
Abstract

Data discretization is an important step in machine learning, since classifiers handle discrete attributes more easily than continuous ones. Over the years, several discretization methods, such as Boolean reasoning, equal-frequency binning, and entropy-based discretization, have been proposed, explored, and implemented. In this article, a simple supervised discretization approach is introduced. The approach is a modified version of the supervised discretization algorithm Minimum Information Loss (MIL), whose prime goal is to maximize the classification accuracy of a classifier while minimizing the loss of information incurred when continuous attributes are discretized. The performance of the suggested approach is compared with that of MIL using the state-of-the-art rule induction algorithm J48 (the Java implementation of the C4.5 classifier). The empirical results show that, in several cases, the modified approach performs better than both the original MIL algorithm and the Minimum Description Length Principle (MDLP) method.
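
The abstract does not describe the internals of MIL or of the modified algorithm, so the sketch below does not implement either. As a hedged illustration of what a discretizer does, it shows two of the generic techniques the abstract names: unsupervised equal-frequency binning, and a single supervised cut point chosen to minimize the weighted class entropy of the two resulting intervals (the core step of entropy-based methods such as MDLP, here without MDLP's recursive splitting or its stopping criterion). All function names are hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def equal_frequency_cuts(values, k):
    """Unsupervised: cut points splitting the sorted values into k bins of roughly equal size."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and the number of values")
    xs = sorted(values)
    n = len(xs)
    cuts = []
    for i in range(1, k):
        idx = i * n // k
        # Place the cut midway between the two values at the bin boundary.
        cut = (xs[idx - 1] + xs[idx]) / 2
        if not cuts or cut > cuts[-1]:  # skip duplicate cuts caused by tied values
            cuts.append(cut)
    return cuts


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values, labels):
    """Supervised: the boundary minimizing the weighted class entropy of the two sides.

    Returns (cut_point, weighted_entropy), or None if all values are equal.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a cut cannot separate equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        e = (i / n) * entropy(left) + ((n - i) / n) * entropy(right)
        if best is None or e < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, e)
    return best


def discretize(value, cuts):
    """Map a continuous value to the index of its interval (number of cuts below it)."""
    return sum(value > c for c in cuts)
```

For example, with values `[1.0, 2.0, 3.0, 10.0, 11.0, 12.0]` and labels `['a', 'a', 'a', 'b', 'b', 'b']`, `best_entropy_cut` returns the cut `6.5` with weighted entropy `0.0`, because that boundary separates the two classes perfectly; `equal_frequency_cuts` with `k=2` happens to pick the same point here, but in general it ignores the labels entirely.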

Related articles

A Novel Discretizer for Knowledge Discovery Approaches Based on Rough Sets

Knowledge discovery approaches based on rough sets have successful application in machine learning and data mining. As these approaches are good at dealing with discrete values, a discretizer is required when the approaches are applied to continuous attributes. In this paper, a novel adaptive discretizer based on a statistical distribution index is proposed to preprocess continuous valued attri...

Incremental Discretization and Bayes Classifiers Handles Concept Drift and Scales Very Well

Many data sets exhibit an early plateau where the performance of a learner peaks after seeing a few hundred (or less) instances. When concepts drift slower than the time to find that plateau, then a simple windowing policy and an incremental discretizer lets standard learners like Naïve Bayes classifiers scale to very large data sets. Our toolkit is simple to implement, can scale to millions of ...

A Bayesian Discretizer for Real-Valued Attributes

Discretization of real-valued attributes into nominal intervals has been an important area for symbolic induction systems because many real world classification tasks involve both symbolic and numerical attributes. Among various supervised and unsupervised discretization methods, the information gain based methods have been widely used and cited. This paper designs a new discretization method, c...

Data discretization: taxonomy and big data challenge

Discretization of numerical data is one of the most influential data preprocessing tasks in knowledge discovery and data mining. The purpose of attribute discretization is to find concise data representations as categories which are adequate for the learning task retaining as much information in the original continuous attribute as possible. In this article, we present an updated overview of di...

Geometry- and surface-assisted micro flow discretization

This paper presents a micro flow discretization system that autonomously digitizes continuous liquid flow into nanoliter segments. Powered by the interactions between liquid flow and micro-channels, the discretization process does not consume any electricity or require any external control. In the prototype demonstration, the discretizer is made of PDMS microfluidic channels with desired geomet...

Journal:
  • CoRR

Volume: abs/1710.05091

Publication date: 2017